Global method for mapping property spaces

ABSTRACT

A computer based technique for the global investigation of property spaces is described. More particularly, a technique is described for the global investigation of chemical space, with reference to drugs, using a system comprised of molecules and descriptors that allows systematic mapping of the chemical space. The technique is used to generate a map of N-dimensional chemical space that allows one to examine, in a consistent manner with existing tools, the inner relationship between various molecules. The technique therefore allows one to investigate unexplored regions of the chemical space, avoiding redundancy, generating compounds with similar chemical information content, or to focus on subsets of the chemical space that are relevant to drug-like structures, without additional errors due to extrapolation.

The present invention relates to a computer based technique for the global investigation of property spaces. More particularly, but not exclusively, the present invention relates to a computer based technique for the global investigation of chemical space, with reference to drugs, using a reference set of molecules and descriptors that allows the systematic mapping of the global chemical space. The technique may be used, for example, to generate a global map of the multidimensional chemical space that allows one to examine, in a consistent manner, the inner relationship between various molecules.

Drug discovery is a time and resource consuming exercise. Current computer based tools allow the description of chemical spaces as local models. Many chemical fields have been targeted by approaches used, or proposed, to discover new chemicals. For example, pharmaceuticals, agrochemicals, cosmetics and perfumes, photographic materials and others have benefited from the methodology developed to assist chemical synthesis. In all these fields, central to the goal of discovery is the novelty of chemical structures, and the novelty of chemical properties. With the advent of parallel synthesis and combinatorial chemistry, large numbers of chemical compounds are now within reach for synthesis and evaluation. Crucial for the practising chemist remains the goal of prioritising, out of thousands or millions of possibilities, which compound to make next.

For pharmaceutical or agrochemical compounds, there are two main types of relevant information in a molecule, i.e. chemical and biological. Medicinal chemistry handles chemical information by identifying classes of “active molecules”, then zooms in on biologically relevant information by performing various bioassays. However, in the initial stages of a research project, where little or no information is available concerning the biological target, chemical information is the only property that one can handle appropriately. Increasing the chemical information known about each compound becomes a goal of such early-phase projects, especially in the absence of active compounds.

“What's the best way to describe a molecule numerically and uniquely? What's the best way to categorise clusters of molecules? Is all this work producing results that are any better than plain old random selection?” These questions are quoted from an article by Elizabeth K. Wilson, “Computers customise combinatorial libraries”, published Apr. 27, 1998 in Chemical & Engineering News (pp. 31-37). This article summarises the issues discussed at the recent American Chemical Society “Diversity Symposium”, organised by Robert S. Pearlman in Dallas, Tex. According to this article, the issues of molecular diversity, and of describing chemicals in a unique and relevant manner, have not been resolved. There is no general consensus as to which approach should be taken.

Molecular similarity is a ubiquitous concept that originates from the XIXth century. Attempts to rigorously define molecular similarity can be found in the book “Concepts and applications of molecular similarity”, edited by Mark A. Johnson and Gerald M. Maggiora, J. Wiley & Sons, ISBN 0-471-62175-7, 1990. The impact of molecular similarity in the field of drug design, and a survey of recent advances in using molecular similarity in the pharmaceutical industry, are the subject of the book “Molecular similarity in drug design”, edited by Philip M. Dean, Chapman & Hall, ISBN 0-7514-0221-4, 1995.

Molecular diversity has been the target of recent molecular similarity-based methods, in the effort to maximise the structural diversity of combinatorial and/or HTS libraries, so as to ensure the largest possible coverage of the chemical space. Molecular diversity analysis methods are surveyed in volume 7/8 of Perspectives in Drug Discovery and Design, ISSN 0928-2866, “Computational methods for the analysis of molecular diversity”, edited by Peter Willett (1997).

Presently, tools to describe chemical space are used to generate local models. For example, Sergio Clementi and co-workers have described the principal properties space for a set of 40 heteroaromatic compounds in Quant. Struct.-Act. Relat. Vol. 15, pp. 108-120 (1996). In their work, Clementi et al. calculated various properties for 45 compounds running the GRID programme, which is based on three-dimensional descriptors. Out of the resulting calculations, they have derived a set of principal properties, and have classified these compounds into ten clusters. However, they classified guanine, a biologically important heteroaromatic ring, as an “outlier” that falls outside the property space of the aforementioned GRID descriptors. One of the inherent limitations of local models is that the validity of the analysis is only as good as the dataset composition, and unique features are reflected in “outliers”, which often tend to skew the statistical results and are, therefore, excluded from the analysis.

Furthermore, local models tend to become outdated as new data are generated. This is illustrated by work performed by Svante Wold and co-workers where an initial three-dimensional property space for the 20 natural amino acids, J. Med. Chem., vol. 30, pp. 1126-1135 (1987), was extended to a five-dimensional set for 55 amino acids, Quant. Struct.-Act. Relat., vol. 8, pp. 204-209 (1989). Recently, this was further extended to a set of 87 amino acids, still using a five-dimensional property space, and published in J. Med. Chem., vol. 41, pp. 2481-2491 (1998). The 5 principal properties derived for amino acids are similar to the Hammett and Taft parameters, widely used in physical organic chemistry textbooks to correlate physico-chemical properties with molecular structures. These properties, termed “Z-scales”, have been tentatively interpreted as measures of lipophilicity (z1), size/polarizability (z2) and polarity (z3), while the fourth and fifth scales (z4 and z5) were more difficult to interpret. This work has extended the principal property space represented by the twenty natural amino acids with an additional set of 67 non-coded amino acids, some of them explicitly synthesised to cover unique properties. However, these Z-scales remain valid only for amino acids, and further synthesis of novel structures would lead to re-evaluation of the principal properties, and of the “Z-scores” for individual amino acids.

Current computer based technology allows the end-user to generate, in silico, extremely large numbers of compounds. For example, Tripos Inc. and Silicon Graphics have announced that they have, in a joint project, created a virtual library consisting of 100 billion molecules, using a “SpaceCrunch” technology.

Tripos' software ChemSpace™ yields “all possible molecular products resulting from given reactions, allowing chemists to start travelling with confidence over large expanses of the chemical universe”. This software promises a structural description of the chemical universe/space, based on single compounds and within certain limits. ChemSpace™ is a searchable database consisting of billions of compounds synthesizable from known reactions and available reagents. This method includes tools to navigate in the database. However, the database represents only a subset of the chemical space, limited by the type of chemical reactions and reactants provided in “SpaceCrunch”. This stepwise manner of mapping chemical space has been, so far, the only alternative to true chemical space navigation.

From all the above, one can observe that there is a considerable need to navigate in the chemical space.

The present invention addresses the disadvantages discussed above and allows one to generate a global model that includes, and can specifically analyse, e.g. heteroaromatic compounds (vide infra), without the risk of extrapolation or outlying behaviour, given that the raw data are correct.

It is an object of the present invention to provide a computer based method to investigate any property space, e.g. a chemical and/or biological space, based on a set of objects (structures), e.g. chemical compounds and/or biologically relevant observations, and a set of variables, e.g. chemical descriptors and/or biologically relevant parameters, that allow a global systematic description of that given property space.

It is a further object of the present invention to provide a computer based method to investigate the chemical space in a global manner, thus avoiding redundancy and providing ways to explore novel regions of the chemical space, without the need for extrapolation.

Viewed from one aspect the present invention provides a method of mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said method comprising the steps of:

storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables;

storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;

determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;

positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and

generating a user output indicative of said relative position of said target object.

The invention recognises that when seeking to map a target object into a target hyper-volume, improved results can be achieved if the model being used includes not only core objects within the target hyper-volume but also satellite objects positioned outside of the target hyper-volume. Whilst the satellite objects may be very different to the target objects of interest, the presence of the satellite objects within the model provides the model with a much higher degree of generality and the ability to cope with target objects that are relatively different from the core objects. In contrast to the global model allowed by the invention, a local model of the type discussed above has the ability to cope with target objects that are of a similar nature to the known objects within the model but is ill-equipped to provide meaningful results when the target object becomes relatively different from the objects already within the local model. For this reason, local models are limited by the type of input data, and are not suitable for the mapping of different types of target object. Furthermore, a relationship between different objects that may be identified with a global model may not be found when those objects are separately modelled within their local individual models.

The present invention seeks to avoid the problems of the prior art by explicitly including molecules with extreme properties in the dataset. These molecules with extreme properties play the role of satellites and allow the principal property values to remain fixed during the analysis. Thus, the present invention provides a consistent method to map the chemical space, not only for amino acids or heteroaromatic compounds, but for any type of chemical compounds considered within the set of conventions described below.

From a mathematical perspective, the use of satellite objects within the model but outside of the target hyper-volume has the advantage of providing a more flexible and globally representative set of unit vectors defining the N-dimensional space against which a particular target object may be mapped. It is surprising that mapping of a target object into a target hyper-volume, e.g. a potential pharmaceutical into the hyper-volume of known pharmaceuticals, is improved by deliberately incorporating satellite objects within the model that have a very different character to the target objects of interest within the target hyper-volume. One way of understanding this improvement is to view the satellite objects as providing the ability to interpolate the position of the target object within the model whereas a local model may require much less accurate extrapolation of the position of a target object if that target object is not very similar to the objects already within the local model.

It will be appreciated that the modelling technique of the invention could be applicable to many different fields. However, the invention is particularly well suited to models in which the objects are chemical structures and the variables are chemical variables. More particularly, the technique is highly beneficial when the target type is pharmaceutically active chemical structures and the core objects include known pharmaceuticals whilst the satellite objects are not pharmaceutically active.

In order to derive the unit vectors representing the component axes within the N-dimensional space, it has been found beneficial to use principal component analysis to determine eigen-vectors to serve as these component unit vectors. Principal component analysis provides a way of identifying the best vectors for representing an N-dimensional space without redundancy that would introduce undesirable complexity.

A target object to be mapped will also be subject to principal component analysis in the sense that it will be positioned within the model using the value of its co-ordinates in the N-dimensional space whose component unit vectors are determined using principal component analysis.
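The positioning step described above can be sketched as follows. This is a minimal illustration only, assuming NumPy is available; the function names `fit_reference_model` and `project`, the unit-variance scaling of each variable, and the synthetic descriptor matrix are all assumptions made for the example and are not taken from the text. The component unit vectors are derived once from the reference set (core plus satellite objects), and a target object is then scored against those fixed vectors without refitting the model.

```python
import numpy as np


def fit_reference_model(reference, n_components):
    """Fit PCA on the reference set (core plus satellite objects).

    Returns the column means, column scales and a loading matrix whose
    columns are the leading eigenvectors (component unit vectors).
    Scaling each variable to unit variance is an assumption of this sketch.
    """
    X = np.asarray(reference, dtype=float)
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0.0] = 1.0  # leave constant variables unscaled
    Z = (X - mean) / scale
    # Right singular vectors of Z are the eigenvectors of its covariance matrix.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, scale, vt[:n_components].T


def project(target, model):
    """Score a target object against the fixed reference model."""
    mean, scale, loadings = model
    return ((np.asarray(target, dtype=float) - mean) / scale) @ loadings


# Synthetic placeholder data: 20 reference objects described by 4 variables.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 10.0, size=(20, 4))
model = fit_reference_model(reference, n_components=2)

target = np.array([5.0, 5.0, 5.0, 5.0])  # hypothetical target object
scores = project(target, model)  # co-ordinates on the first two components
```

Because the means, scales and loadings are computed from the reference set alone, mapping a new target object does not move the objects already in the model.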

If a target object is found to lie outside of the target hyper-volume then it is sometimes useful to add that target object to the model to serve as a satellite object. Whilst the position of the target object outside of the target hyper-volume makes it more difficult to interpret its relationship with other objects within the model, its addition to the model can have the advantage of improving the degree of global applicability of the model and may also serve to indicate a relationship with some future target object to be mapped to the model.

Many different variables and maximum and minimum values can be chosen for the model. However, in the case of a model seeking to identify pharmaceuticals, particularly useful properties are molecular weight, molecular size, molecular flexibility, molecular rigidity, formal negative charges, formal positive charges, the ability to accept hydrogen bonds, the ability to donate hydrogen bonds, lipophilicity and atomic polarisabilities as described by variables related to the aforementioned properties (vide infra). It will be appreciated that different combinations of these variables can be used in combination with other variables if so desired. Calculated molecular refractivity and molecular volume may also be used as alternatives to or in addition to molecular size.

Viewed from another aspect the present invention provides an apparatus for mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said apparatus comprising:

a memory for storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables and for storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;

determination logic for determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;

positioning logic for positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and

a user output device for generating a user output indicative of said relative position of said target object.

Viewed from a further aspect the invention provides a method of forming a model in N-dimensional space containing a plurality of objects and a target hyper-volume into which target objects are to be mapped, said method comprising the steps of:

selecting a set of variables defining said N-dimensional space;

selecting maximum and minimum values for said variables;

selecting a representative set of core objects within said target volume;

selecting a representative set of satellite objects outside of said target volume; and

iteratively testing and altering said model to obtain a set of variables, maximum and minimum values, core objects and satellite objects that span said N-dimensional space and allow target objects to be mapped to within said target volume.

Viewed from a still further aspect the invention provides a carrier medium carrying a computer program product for mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said computer program product providing the processing steps of:

storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables;

storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;

determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;

positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and

generating a user output indicative of said relative position of said target object.

It will be appreciated that the carrier medium for carrying the computer program could take many different forms. Examples of carrier media include magnetic discs, optical discs, memory integrated circuits and the like, but also include distribution media such as distribution via a telecommunications system, e.g. the downloading of computer software from a telecommunications medium such as the internet.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 and 2 are example portions of global maps of chemical space;

FIG. 3 is an example of a local map;

FIGS. 4, 5 and 6 are flow diagrams illustrating the operation of one embodiment of the invention;

FIG. 7 schematically illustrates a computer for performing modelling.

As used herein, the following words and expressions are intended to have the following meaning:

anchor object: objects situated at the corners of a user-defined region of the global property space, for example those that can be defined by the “Pfizer's rule of 5” for the oral absorption of drugs.

convergent: tending to move toward one point or to approach each other

core: the central, innermost, or most essential part of anything.

core object: an object that is positioned globally inside the hypervolume of interest, e.g., an object structure situated inside the chemical space of drug-like compounds, where this core object possesses average and/or typical properties related to the drug-like space and/or the object is a marketed drug structure.

global: covering the whole of a group of items, where the group of items could be, e.g., all the organic chemicals composed of C, H, N, O, S, P and halogens with a molecular mass of maximum 1500.

satellite: anything that depends on, accompanies, or serves something else.

satellite object: an object that is positioned globally outside the hypervolume of interest, e.g., a satellite structure situated outside the chemical space of drug-like compounds, where this satellite possesses extreme properties related to the aforementioned drug-like structures.

Briefly, the process of forming and using a model can comprise the following steps:

choosing and/or defining the property space

defining the extreme values for the set of relevant properties

choosing a representative set of satellite and core objects

obtaining a convergent set of rules

modelling

outputting a map

The following describes a computer based method for global navigation in property spaces, particularly in the chemical space, hereinafter referred to as ChemGPS (Chemical Global Property System). ChemGPS allows one to investigate unexplored regions of the chemical space, to avoid redundancy, to generate compounds with similar chemical information content, or to focus on subsets of the chemical space that are relevant to drug-like structures, without additional errors due to extrapolation.

Current tools allow description of chemical spaces as local models, based on (i) a set of existing and theoretical molecules, (ii) a set of chemical descriptors and (iii) multivariate analysis. We have surprisingly found a computer based method that maps the global chemical space whilst allowing the reuse of many existing tools. In particular, the present system makes it possible to globally investigate the chemical space that is relevant to drug-like compounds. This is accomplished using a model that includes a set of rules and a set of “satellite” molecules, i.e. molecules having extreme properties, that are intentionally placed outside the property space of interest (target hyper-volume), along with other chemical structures that would not normally be deemed relevant for this problem. The chemical descriptors that are used may, or may not, be relevant to biological space.

The present technique in general allows one to systematically and consistently investigate any property space, including the chemical space.

The purpose of the present example technique is to overcome the limitations of the earlier approaches and to generate a global model of the chemical space that is relevant to drug-like compounds, using a mapping convention, hereby termed chemography. Analogous to the Mercator convention, widely used in geography, chemography consists of a set of conventions used to navigate in chemical space. These conventions include a set of filtering properties, and a list of molecules that are chosen to cover these properties.

For the drug-related chemical space, these filtering properties include, but are not limited to: size and molecular weight, flexibility, rigidity, negative and positive formal charges, the number of hydrogen bond donors and acceptors, lipophilicity and polarizability. Other filtering properties that may be relevant, and also within the scope of the present technique, are pKa, logD and pharmacophoric patterns relating chemicals to a given receptor, but also chemical fingerprints such as those obtained using the Daylight CIS software or the Tripos Inc. software.

For drug-like molecules, the chemical space can be set by the following rules (maximum and minimum values for variables): molecular weight below 1500 daltons; calculated octanol/water partition coefficient between −20 and +20; up to 50 non-terminal rotatable bonds; up to 50 hydrogen bond donors and acceptors; up to 6 net formal charge units (−6 to +6); and up to 30 rigid centers. For the purpose of illustration, the chemical space comprises only the following elements: carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus and halogens. The above rules are used to filter the type and kind of compounds included. However, they do not represent the final set of descriptors, since the chemical space can be extended.
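By way of illustration only, the rules above can be expressed as a simple filter. In this sketch the key names (`MW`, `LIPO`, `FLEXI`, `HBDON`, `HBACC`, `NET_CHARGE`, `RIGID`) and the function `violations` are hypothetical; the bounds are treated as inclusive, the hydrogen bond limit is applied to donors and acceptors separately, and the halogens are taken to be F, Cl, Br and I. Each of these choices is an assumption where the rules leave room for interpretation.

```python
# Elements permitted in the illustrative chemical space (halogens assumed
# to mean F, Cl, Br and I).
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "S", "P", "F", "Cl", "Br", "I"}

# (minimum, maximum) per filtering property, following the rules quoted above.
# Bounds are treated as inclusive, which is an assumption of this sketch.
LIMITS = {
    "MW": (0.0, 1500.0),
    "LIPO": (-20.0, 20.0),
    "FLEXI": (0, 50),
    "HBDON": (0, 50),
    "HBACC": (0, 50),
    "NET_CHARGE": (-6, 6),
    "RIGID": (0, 30),
}


def violations(descriptors, elements, limits=LIMITS):
    """Return a list of rule violations; an empty list means the compound passes."""
    problems = []
    for name, (low, high) in limits.items():
        value = descriptors[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    extra = set(elements) - ALLOWED_ELEMENTS
    if extra:
        problems.append("disallowed elements: " + ", ".join(sorted(extra)))
    return problems


# Urea, using the descriptor values listed for it among the satellite objects.
urea = {"MW": 60.05, "LIPO": -2.11, "FLEXI": 0, "HBDON": 2, "HBACC": 1,
        "NET_CHARGE": 0, "RIGID": 1}
print(violations(urea, {"C", "H", "N", "O"}))  # prints []
```

Passing a different `limits` mapping lets the end-user tighten or relax the rules without changing the function.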

For example, the definition of the chemical space relevant to drug properties can be extended by including other chemical elements (transition metals, boron, alkaline metals, etc.), and/or other properties referring to descriptors such as pKa, LogD, and/or Spartan-calculated descriptors, and also by including methods that describe chemical fingerprints like those from Daylight or Tripos, and ISIS keys, all of which are within the scope of the present technique.

Two types of molecules (objects) are also considered as an integral part of the present technique: “core” and “satellite” objects. Core objects comprise a representative set of e.g. pharmaceutical and/or any other relevant chemicals, chosen according to the filtering properties. Satellite objects, that represent a crucial part of the present technique, are molecules chosen to represent extreme filtering properties, and are intentionally placed outside the drug-related property space (target hyper-volume).

The final step of the method refers to the use of the present technique for mapping new objects. This includes estimation of chemical properties via computer calculation of descriptors and subsequent PCA of any set of objects. Derived from multivariate analysis, the new set of latent properties is consistent for each compound, and is valid within the rules set by the afore-mentioned chemographic convention. The map is illustrated via score plots for each significant latent property, where each score is obtained using prediction based on the converged ChemGPS model. The ChemGPS method allows one to avoid the limitations of local models, and provides the tools to navigate in the chemical space via interpolation, in contrast to “external prediction”, which is performed via potentially unreliable extrapolation from local models.

For the purpose of illustration, we provide the following list of software methods to calculate descriptors relevant for chemical properties: Sybyl (from Tripos Inc, St. Louis, Mo.); Spartan (from Wavefunction, Irvine, Calif.); TSAR (from Oxford Molecular, Oxford, England); GRID (from Peter Goodford, Oxford, England); Cerius2 (from Molecular Simulations, Inc., San Diego, Calif.); Molconn-Z (Hall Associates Consulting, Quincy, Mass.); CLOGP (Biobyte, Claremont, Calif.); ACDLogD (ACD Labs Inc, Toronto, Canada); Hybot 5.0 (Raevsky & Grigorev, Moscow, Russia), etc. Based on the above list of descriptors, one can perform data compression using artificial neural networks (with a number of methods, e.g., the Kohonen self-organizing map), and/or data reduction using principal component analysis (PCA), by way of the SIMCA software (from Umetri, Umeå, Sweden), or Unscrambler (from CAMO AS, Oslo, Norway), or Matlab (Comsol, Stockholm, Sweden). The preferred way of the invention is to perform data reduction using PCA, as implemented in the SIMCA package.

The present technique represents a chemographic system that, for example, can be applied to the pharmaceutically relevant chemical space. It could, however, be applied to any other part of chemistry, e.g. agrochemicals, perfumes, dyes and pigments, polymers, etc., by appropriate use of the filtering properties that are relevant to that particular field of chemistry, and by an appropriate choice of core and satellite objects that are relevant to that area of chemical space. Satellite objects that are relevant to a certain region of the chemical space need not be informative for other regions.

The present technique can be viewed as a computer based method to generate a global model of any property space that is deemed of interest, and can be implemented in a plurality of property spaces. By using a set of rules and objects, described above for the chemical space as “chemography”, one can generate a set of filtering properties and a list of satellite and core objects that best describe that property space. Illustratively, the property space may be the chemical space relevant to drug-like compounds. The present technique can thus provide a powerful tool or method for navigating in the chemical space that includes any desired physical and/or chemical properties, but has particular utility for the discovery of pharmaceutical drugs. Referring to the filtering properties that are relevant to the drug space, a list exemplifying such properties is given below:

M.W.: This descriptor is the molecular weight of each compound, expressed in daltons.

SIZE: This descriptor is a rough estimate of molecular size, and adds 1 point for every four non-hydrogen atoms. This is based on the rough estimate that a 500 dalton compound has 40 heavy atoms. This complements M.W., since third (or higher) period elements contribute significantly to M.W., without a similar increase in the size of the molecule.

FLEXI: This is a count of all non-terminal rotatable bonds and/or repeating units such as (C=C-C). The molecule may include rigid units positioned between the non-terminal rotatable bonds. This descriptor also adds “n” for each non-rigid ring with n=N−4−x/2, where N is the number of bonds in a ring and x is the number of atoms in other rings (spiro counts as 1). Amides and esters are ignored.

RIGID: This is a count of all rigid structures such as rings, amides, ester groups, carboxylates, nitro groups, etc. Additionally, 1 point is given for each fused ring.

QMINUS: This is a count of all formal negative charges only, such as carboxyl groups, sulfate groups, phosphate groups, etc., and all other groups that can be deprotonated at physiological pH.

QPLUS: This is a count of all formal positive charges only, such as amines, amidines, guanidines, etc., and all other groups that can be protonated at physiological pH.

HBACC: This is a count of all oxygen and nitrogen atoms that can accept hydrogen bonds.

HBDON: Counts O-H and N-H moieties only. Carboxyl groups are ignored, because they are ionised.

LIPO: This descriptor is based on the CLOGP software (from Biobyte, Claremont, Calif.) to estimate the partition coefficient between n-octanol and water. Other logP calculators such as ACDLogP (Advanced Chemistry Development, Toronto, Canada) are used whenever CLOGP fails.

POLAR: This descriptor is a sum of atomic polarizabilities, based on the following empirical polarizability scale: H 0.35; C 1.3; N 1.0; O 0.6; S 2.9; P 3.3; F 0.4; Cl 2.3; Br 3.2; I 5.0.
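The POLAR descriptor can be illustrated with a short sketch that sums the scale values over the atoms of a molecule. The function name `polar` and the use of a per-element atom count as input are assumptions made for the example; the molecular formulas used for urea (CH4N2O) and tetraphenyl-adamantane (C34H32) are supplied here for illustration and do not appear in the text.

```python
# Empirical atomic polarizability scale quoted above.
POLARIZABILITY = {
    "H": 0.35, "C": 1.3, "N": 1.0, "O": 0.6, "S": 2.9,
    "P": 3.3, "F": 0.4, "Cl": 2.3, "Br": 3.2, "I": 5.0,
}


def polar(atom_counts):
    """Sum atomic polarizabilities over a mapping of element -> atom count.

    Raises KeyError for elements that are not on the scale.
    """
    return sum(POLARIZABILITY[element] * count
               for element, count in atom_counts.items())


urea = {"C": 1, "H": 4, "N": 2, "O": 1}       # CH4N2O (assumed formula)
tetraphenyl_adamantane = {"C": 34, "H": 32}   # C34H32 (assumed formula)

print(round(polar(urea), 1))                    # 5.3
print(round(polar(tetraphenyl_adamantane), 1))  # 55.4
```

Under these assumed formulas, both results agree with the POLAR values listed for these two compounds among the satellite objects.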

Based on the above filtering properties, a number of molecules that are within certain limits are included in a chemical database, for example an ISIS database. Filtering property limits (i.e. maximum and minimum variable values) are set by the end-user, and can be modified according to the purpose of the navigational tool. For drug-like molecules, one can for example assume the following limits: maximum M.W. 1500 daltons, LIPO between −10 and +20, maximum FLEXI 50, up to 20 hydrogen bond donors and acceptors (HBDON/HBACC), up to 6 net ionic charges (QMINUS and QPLUS), and up to 30 rigid centers (RIGID). In this illustrative example, only S, N, O, P and X (halogens) were considered as heteroelements (besides C and H).

The core objects are chosen according to the initial set of filtering properties, so that they are diverse under the above filter scheme. For example, propranolol, secobarbital, trandolapril, ibuprofen, tolbutamide, caffeine and omeprazole are pharmaceutically and chemically distinct drugs that may be included in the “core” set of objects.

Satellite objects consist of a list of molecules chosen to represent chemical structures that have extreme values according to the filtering properties. They are analogous to the global positioning satellite (GPS) system, because they are intentionally situated outside the chemical space to provide interpolation (not extrapolation) abilities to any model derived using these structures. For example, acetyldigitoxin, a cardiovascular drug, represents a satellite (M.W.=806.98; SIZE=14; FLEXI=8; RIGID=10; QMINUS=QPLUS=0; HBACC=14; HBDON=4; LIPO=2; POLAR=64.3), in a similar manner to vinblastine, an anti-cancer agent (M.W.=813; SIZE=15; FLEXI=8; RIGID=14; QMINUS=0; QPLUS=2; HBACC=10; HBDON=4; LIPO=3.69; POLAR=90.2). For property extrema not well illustrated by drugs, additional compounds can be included, such as tetraphenyl-adamantane (M.W.=440.63; SIZE=8; FLEXI=4; RIGID=8; QMINUS=QPLUS=HBACC=HBDON=0; LIPO=9.63; POLAR=55.4), and the arginine tetrapeptide (M.W.=654.86; SIZE=11; FLEXI=34; RIGID=11; QMINUS=−1; QPLUS=5; HBACC=5; HBDON=16; LIPO=−10; POLAR=67.7). At the other end of the property space, compounds such as carbon tetrachloride (M.W.=153.32; SIZE=1; FLEXI=0; RIGID=1; QMINUS=QPLUS=HBACC=HBDON=0; LIPO=2.83; POLAR=10.1), and urea (M.W.=60.05; SIZE=1; FLEXI=0; RIGID=1; QMINUS=QPLUS=0; HBACC=1; HBDON=2; LIPO=−2.11; POLAR=5.3), can also be included.
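The interpolation property described above can be illustrated with a deliberately simplified check: a target is treated as interpolated if each of its variables lies within the range spanned by the satellites. This per-variable box test is an assumption made for the example (the description does not prescribe it), and the function name `extrapolated_variables` is hypothetical. The satellite values are those quoted above; the two targets are the anchor objects quoted further on in this description.

```python
VARIABLES = ("MW", "SIZE", "FLEXI", "RIGID", "QMINUS",
             "QPLUS", "HBACC", "HBDON", "LIPO", "POLAR")

# Satellite descriptor values quoted in the paragraph above.
SATELLITES = {
    "acetyldigitoxin":        (806.98, 14, 8, 10, 0, 0, 14, 4, 2.0, 64.3),
    "vinblastine":            (813.0, 15, 8, 14, 0, 2, 10, 4, 3.69, 90.2),
    "tetraphenyl-adamantane": (440.63, 8, 4, 8, 0, 0, 0, 0, 9.63, 55.4),
    "arginine tetrapeptide":  (654.86, 11, 34, 11, -1, 5, 5, 16, -10.0, 67.7),
    "carbon tetrachloride":   (153.32, 1, 0, 1, 0, 0, 0, 0, 2.83, 10.1),
    "urea":                   (60.05, 1, 0, 1, 0, 0, 1, 2, -2.11, 5.3),
}


def extrapolated_variables(target, reference=SATELLITES):
    """List the variables for which the target falls outside the reference range."""
    rows = list(reference.values())
    outside = []
    for i, name in enumerate(VARIABLES):
        column = [row[i] for row in rows]
        if not min(column) <= target[i] <= max(column):
            outside.append(name)
    return outside


tetrathiane = (153.82, 1, 2, 1, 0, 0, 0, 0, 1.02, 15.84)
bis_methyldisulfanyl_methane = (172.36, 2, 4, 0, 0, 0, 0, 0, 2.19, 18.8)
print(extrapolated_variables(tetrathiane))                   # []
print(extrapolated_variables(bis_methyldisulfanyl_methane))  # ['RIGID']
```

With only these six satellites, RIGID=0 is not covered, which illustrates why additional satellites may be needed to span every variable.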

The combination of satellite and core objects, selected using a pre-defined filtering scheme, defines a convention by which any property space can be described comprehensively. Without this approach, models are limited by their dataset in their ability to properly represent the property space of interest. The presence of satellite and core objects stabilises the model, and the resulting “scores” are applicable to all areas of the property space of interest (target hyper-volume), as long as this space is covered by the satellite system. In an analogy to the Global Positioning System (GPS), the present technique allows one to establish where New England is on the world map, in relationship to Sweden and India, whereas other approaches are only able to provide local maps of the aforementioned regions. By introducing satellite objects with an N-dimensional model, we have surprisingly been able to provide an improved mapping technique for the chemical space.

To further illustrate the versatility of the present technique, we also introduce the concept of “anchor” objects. Anchor objects are molecules situated at the corners of a region of the drug space that is defined by Pfizer's “rule of 5”. This rule has been empirically derived by a computer analysis of known drugs, as described by Christopher A. Lipinski and co-workers in Adv. Drug Delivery Rev., vol. 23, pp. 3-25 (1997). The “rule of 5” is focused on drug permeability and oral absorption, and by introducing anchor objects, one is capable of using the chemographic tool to focus on drug-like molecules that show good pharmacokinetic properties. According to Pfizer's “rule of 5”, LIPO and HBDON are between 0 and 5, HBACC is between 0 and 10, and M.W. has a maximum of 500. For example, 1,2,4,5-tetrathiane (C2H4S4) (M.W.=153.82; SIZE=1; FLEXI=2; RIGID=1; QMINUS=QPLUS=HBACC=HBDON=0; LIPO=1.02; POLAR=15.84), and bis-methyldisulfanyl-methane (M.W.=172.36; SIZE=2; FLEXI=4; RIGID=QMINUS=QPLUS=HBACC=HBDON=0; LIPO=2.19; POLAR=18.8) are virtual anchor points that constitute representative objects situated in an extreme region, as defined by Pfizer's “rule of 5”.
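The “rule of 5” region, as stated in the paragraph above, can be written as a single predicate. The function name `inside_rule_of_5` is hypothetical, and the bounds are taken exactly as given in this description rather than from any other formulation of the rule.

```python
def inside_rule_of_5(mw, lipo, hbdon, hbacc):
    """True if the descriptors fall in the region described above:
    M.W. at most 500, LIPO and HBDON between 0 and 5, HBACC between 0 and 10."""
    return (mw <= 500
            and 0 <= lipo <= 5
            and 0 <= hbdon <= 5
            and 0 <= hbacc <= 10)


# 1,2,4,5-tetrathiane (anchor) and vinblastine (satellite), values from the text.
print(inside_rule_of_5(mw=153.82, lipo=1.02, hbdon=0, hbacc=0))   # True
print(inside_rule_of_5(mw=813.0, lipo=3.69, hbdon=4, hbacc=10))   # False
```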

A method of forming a model in accordance with the present technique is shown in flowchart form in FIGS. 4 to 6. FIG. 4 is a general scheme of carrying out the present technique, and includes several steps explicitly shown in more detail in FIG. 5 and FIG. 6, respectively. FIG. 5 is a detailed description of steps 1 to 6, which are explained below. FIG. 6 is a detailed description of steps 7 to 13, which are also explained below.

The initial steps in the present technique are to choose the property space, with its objects and variables (see FIG. 5). These steps take the user to a crude model that can be used to map objects in the property space. However, steps 7 to 16 are recommended before proceeding to step 17. Steps 7 to 13 of the present technique refine the initial model from steps 1 to 6, as shown in FIG. 6.

The process of the present technique in the drug-like property space comprises the following steps:

Step 1

is to choose a set of “core” objects that are representative for the area of interest, e.g. drug-like molecules.

Step 2

is to identify a set of relevant properties for the core objects identified in step 1, for example lipophilicity and size for drug-like compounds.

Step 3

is to identify variables that, in a meaningful way, describe the properties that are found to be relevant in step 2, for example calculated octanol/water partition coefficient (CLOGP) to describe lipophilicity, and calculated molecular refractivity (CMR) to describe molecular size.

Step 4

is to define and/or choose the range of the variables identified in step 3. Extreme values are chosen for each variable in this step. For example, values between −10 and +20 could be chosen for CLOGP, and values between 1 and 30 could be chosen for CMR.

Step 5

is to choose a representative set of satellite objects. Satellites are existing or hypothetical objects that correspond to extreme values of the chosen variables. For example, a large (CMR=20) and very lipophilic compound (CLOGP=10) could be seen as a satellite in the drug space.

Step 6

is to define an initial set of rules, i.e. an initial set of variables and an initial set of core and satellite objects.

Step 7

is to refine the set of rules chosen in step 6, if these are not satisfactory according to conditions described in steps 10a and 10b.

Step 8

is to generate a model based on the dataset constructed in steps 6 and 7, by using, e.g., principal component analysis (PCA), which identifies eigen-vectors within the property space.

Step 9

is to use the model from step 8 to predict values, e.g. the PCA scores, for the variables chosen in steps 6 and 7 for an external set of objects (one or more target objects to be positioned within the model). The external set typically consists of a set of objects that the user may find of interest to his particular problem. For example, a set of chemical reactants (objects) may be submitted to variable prediction to estimate their molecular properties in comparison to drug-like properties.

Step 10

consists of two parts: step 10a detects the presence of outlying objects predicted in step 9. If such outliers are detected, then proceed to step 11a. Step 10b evaluates the variable set. This is where the appropriateness of the variable set for the problem of interest is re-evaluated. If the chosen variables are not appropriate, then proceed to step 11b. If neither step 11a nor step 11b is required, then proceed to step 12.

Step 11

consists of two parts: step 11a is the process of adding outliers to the dataset, since they are likely to be satellites. If these outliers (or objects with equivalent variable values) are already present in that dataset, they should not be added again, since an overrepresentation of satellites in the model is not recommended (see step 12). Step 11b is to define additional variables in order to obtain a comprehensive description of the property space defined in step 2. For example, polarity may be relevant to drug-like properties, in addition to lipophilicity and size. In this case, new variables describing polarity (e.g., electronegativity) should be added to the variable set, and representative satellite structures should be included.

Step 12

is performed to balance the composition of the dataset, regarding the number of core and satellite objects, for statistical reasons. For example, an overrepresentation of either type of object may lead to a skewed final map (see step 17).

Step 13

is to make (or remake) the model, according to the existing set of rules (objects and variables), by using, e.g., PCA.

Step 14

is reached when the model has already been refined, and a satisfactory set of objects with corresponding relevant variables has been selected.

Step 15

is to use the refined model from step 14 to predict variable values, e.g. the PCA scores, for a new external set of objects, or for the same set of objects as in step 9.

Step 16

consists of looking for outlying objects. If, at this stage, outliers are detected, then step 11a should be performed, and the process should be reiterated from step 7. Otherwise, proceed to step 17.

Step 17

is the construction of the property space map, using, e.g., PCA score plots. The map is ready to use for navigating in that particular space.

Step 18

is the use of the map obtained in Step 17 to assign map scores (or co-ordinates) for new objects (target objects). This step includes estimation of object properties via computer calculation of descriptors and subsequent PCA. Score values, which can be displayed as score plots, are then obtained for each significant latent property via prediction. Each score value is obtained using prediction based on the converged model (i.e. using the same evaluation techniques).
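The core of the steps above (build a dataset of core and satellite objects, fit a PCA model, then predict scores for a target object with the same scaling and loadings) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the present technique: it extracts only the first principal component, uses plain power iteration in place of a chemometrics package, autoscales the variables (an assumed pre-treatment), and uses three of the ten variables. The core rows are hypothetical values invented for the example; the satellite and target rows are taken from values quoted in this description. All function names are hypothetical.

```python
import math


def autoscale_params(rows):
    """Column means and sample standard deviations of the training set."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
            for j in range(m)]
    return means, stds


def autoscale(row, means, stds):
    """Centre and scale one object with the training-set parameters."""
    return [(x - mu) / sd for x, mu, sd in zip(row, means, stds)]


def first_loading(X, iterations=500):
    """First principal-component loading, by power iteration on X^T X."""
    m = len(X[0])
    p = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        t = [sum(x * w for x, w in zip(row, p)) for row in X]   # current scores
        q = [sum(t[i] * X[i][j] for i in range(len(X))) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    return p


def score(row, loading):
    """Project one autoscaled object onto a loading vector."""
    return sum(x * w for x, w in zip(row, loading))


# Columns: M.W., LIPO, HBDON.
core = [(250.0, 2.0, 1), (300.0, 3.0, 2),
        (350.0, 1.0, 2), (200.0, 2.5, 1)]            # hypothetical core objects
satellites = [(813.0, 3.69, 4), (654.86, -10.0, 16),  # vinblastine, arginine tetrapeptide
              (153.32, 2.83, 0), (60.05, -2.11, 2)]   # carbon tetrachloride, urea

training = core + satellites                          # steps 1 to 6
means, stds = autoscale_params(training)
X = [autoscale(r, means, stds) for r in training]
p1 = first_loading(X)                                 # steps 8 and 13
training_scores = [score(r, p1) for r in X]

target = (153.82, 1.02, 0)                            # 1,2,4,5-tetrathiane
target_score = score(autoscale(target, means, stds), p1)  # steps 9, 15 and 18
print(target_score)
```

The target is scaled with the means and standard deviations of the training set, not its own, which is what makes its score comparable with those of the core and satellite objects.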

The present technique can be used to generate a map of the multidimensional chemical space that allows one to examine, in a consistent and conventional manner, the inner relationship between various molecules. The present technique can therefore be used for the following:

Use for selection of reactants and molecular diversity analysis, prior to combinatorial chemistry and/or parallel synthesis.

Use for compound (product) clustering and evaluation of molecular similarity, prior to structure-activity relationship studies.

Use for database analysis and comparison of chemical information.

Use for defining subregions of the property space, via “anchor” objects, that can be used to visualize the same region.

Use of the map of the chemical space to assign molecular similarity and/or diversity.

Use of the map to predict the probability of a given structure to have good oral absorption if administered orally (per os).

Use of map co-ordinates as descriptors for QSAR, QSPR and other mathematical models relating chemical structure to macroscopic properties.

Use of map co-ordinates as unique identifiers for storing molecules in a chemical database, e.g. similar to the CAS number or to the unique SMILES string, but having the advantage that they incorporate information relating to chemical properties.

Use of the ChemGPS system to predict drug-related properties for novel compounds.

Use of the ChemGPS system for discriminating “drug-like” vs. “non-drug-like”.
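As an illustration of the similarity and diversity uses listed above, the distance between two objects can be taken in map co-ordinates. The choice of Euclidean distance, the function names and the co-ordinate values below are all assumptions made for this example; the description does not prescribe a particular distance measure.

```python
import math


def map_distance(a, b):
    """Euclidean distance between two objects given as map (score) co-ordinates."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of co-ordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(target, library):
    """Name of the library object closest to the target on the map."""
    return min(library, key=lambda name: map_distance(target, library[name]))


# Hypothetical score co-ordinates on the first two principal components.
library = {"compound A": (0.5, -1.0), "compound B": (4.0, 2.5), "compound C": (-3.0, 0.2)}
print(nearest((0.0, -0.5), library))  # compound A
```

A small distance suggests similar property profiles; selecting objects that are far apart on the map is one way to favour diversity.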

Some possible uses foreseen for the described system will be exemplified by its performance on rigid heterocyclic compounds in comparison to the GRID/GOLPE approach as presented by Clementi et al., Quant. Struct.-Act. Relat. Vol. 15, pp. 108-120 (1996), which is discussed first below.

Reference should be made to the examples described by Clementi et al., Quant. Struct.-Act. Relat. Vol. 15, pp. 108-120 (1996).

GRID is a program that calculates interaction energies of given probes with the molecules.

The following GRID probes were used:

N1 (neutral flat NH) as in main chain amide

N:= (sp2 N with lone pair) as in Tryptophan, Histidine

N1+ (sp3 amine NH cation) as in Arginine, Lysine, Histidine

OH (sp2 hydroxy group) as in Tyrosine

O (sp2 carbonyl oxygen) as in main chain amide

O1 (sp3 hydroxy group) as in Serine, Threonine

COO− (carboxylate) as in Aspartate, Glutamate

CONH2 (amide group) as in Asparagine, Glutamine

Amidine as in Arginine

Additional volumes and surfaces were calculated for hydrophobic and hydrophilic probes (4 descriptors).

Four principal properties were obtained using these calculations. Guanine was excluded as an outlier, and for the rest, the results are interpreted below:

PP1: negative means hydrophobe, positive means hydrophile (40% contribution);

PP2: describes the H-bond ability, and differentiates acceptors (azoles, azines) from slight donors (diazoles, pyridones) (16% contribution);

PP3: separates shape and hydrophobicity: monocyclic (negative) and bicyclic (positive). The carboxylate probe determines the relative positions of the compounds (16% contribution);

PP4: multiple amidine interactions; differentiates chalcogen [O, S] containing systems from those with nitrogen (10% contribution).

This led to 10 groups of heteroaromatic compounds. The best representatives are given below: 1A: pyrrole; 1B: thiophene; 1C: indole; 1D: benzothiophene; 2E: pyridine; 3G: imidazole; 4F: quinoline; 4H: benzimidazole; 5I: uracil; 5J: purine.

Note: Aniline, benzene, phenol, naphthalene and thiazole were not included in the clustering scheme.

Clementi et al., Quant. Struct.-Act. Relat. Vol. 15, pp. 108-120 (1996), have thus calculated several 3-D interactions using different property probes, i.e. for every training set molecule they calculated the interaction energy between a probe (a methyl or NH4+ moiety, etc.) and the molecule at every grid-point, with e.g. 0.5 Å resolution (the GRID approach by P. Goodford). The resulting data-matrices, containing >50 000 data-points, were then subjected to chemometric calculations similar to those performed by us. By doing this they created a local model that describes well the chemical space of rigid heterocyclic compounds (n=45 for their training set).

Example of ChemGPS

We initially performed database searches in “Comprehensive Medicinal Chemistry”, a publicly available database, and identified an initial set of 128 molecules. These chemicals were substances registered for medical use and were defined as the initial set of “core” objects (Step 1).

The following properties were identified as being relevant for these drugs: mass, size, lipophilicity, charge, flexibility, hydrogen-bond donor and acceptor ability, and polarizability (Step 2).

The following variables were identified (Step 3):

M.W.: This descriptor is the molecular weight of each compound, expressed in daltons.

SIZE: This descriptor is a rough estimate of molecular size, and adds 1 point for every four non-hydrogen atoms. This is based on the rough estimate that a 500 dalton compound has 40 heavy atoms. This complements M.W., since third (or higher) period elements contribute significantly to M.W., without a similar increase in the size of the molecule.

FLEXI: This is a count of all non-terminal rotatable bonds and/or repeating units such as (C═C—C). This descriptor also adds “n” for each non-rigid ring, with n=N−4−x/2, where N is the number of bonds in a ring and x is the number of atoms in other rings (spiro counts as 1). Amides and esters are ignored.

RIGID: This is a count of all rigid structures such as rings, amides, ester groups, carboxylates, nitro groups, etc. Additionally, 1 point is given for each fused ring.

QMINUS: This is a count of all formal negative charges only: carboxyls, sulfates, phosphates, etc., and all other groups that can be deprotonated at physiological pH.

QPLUS: This is a count of all formal positive charges only: amines, amidines, guanidines, etc., and all other groups that can be protonated at physiological pH.

HBACC: This is a count of all oxygens and nitrogens that can accept hydrogen bonds.

HBDON: Counts O—H and N—H moieties only. Because carboxyls are ionized, they are ignored.

LIPO: This descriptor is based on the CLOGP software (from Biobyte, Claremont, Calif.) to estimate the partition coefficient between octanol and water. Other logP calculators such as ACDLOGP (Advanced Chemistry Developments, Toronto, Canada) are used whenever CLOGP fails.

POLAR: This descriptor is a sum of atomic polarizabilities, based on the following empirical polarizability scale: H 0.35; C 1.3; N 1.0; O 0.6; S 2.9; F 0.4; Cl 2.3; Br 3.2; I 5.0.
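Of the variables listed above, SIZE is simple enough to sketch directly. The description gives one point per four non-hydrogen atoms but does not state how fractions are treated; rounding to the nearest integer with Python's built-in `round` is an assumption made here, and the function name `size_points` is hypothetical.

```python
def size_points(heavy_atoms):
    """One SIZE point per four non-hydrogen atoms.

    Rounding to the nearest integer is an assumption; the text does not
    state how fractional points are handled.
    """
    return round(heavy_atoms / 4)


print(size_points(40))  # 10, the 500 dalton rule-of-thumb compound
print(size_points(4))   # 1, urea (CH4N2O has 4 non-hydrogen atoms)
print(size_points(5))   # 1, carbon tetrachloride (CCl4)
```

The last two results agree with the SIZE=1 values quoted for urea and carbon tetrachloride in this description.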

Based on this initial choice, extreme values were defined, for each variable, as shown in the table below (Step 4):

Variable   Maximum   Minimum
M.W.       822       60
FLEXI      34        0
SIZE       14        1
RIGID      12        0
QMINUS     0         −4
QPLUS      5         0
HBACC      14        0
HBDON      16        0
LIPO       10        −10
POLAR      120       5.2

Having defined extreme values for the above variables, 61 existing or hypothetical compounds were assigned as satellite objects (Step 5). Thus, an initial set of 189 objects (128 core objects and 61 satellites), each with 10 calculated variable datapoints, was defined (Step 6). This was the initial set of rules.

The set of rules chosen in step 6 was not refined (step 7).

The dataset described in steps 6 and 7 was subjected to PCA, rendering an initial model with three significant (by cross-validation) principal components (Step 8).

The 45 aromatic structures described by Clementi et al. [1] were predicted (Step 9).

We did not find any outliers (Step 10a). However, a better description of the property space was required (Step 10b).

Therefore, the descriptor set was refined (Step 11b), and included 62 descriptors in the final analysis (before Step 12 was performed). Among these descriptors, some were topological indices, some were related to electrostatic properties, and some were related to hydrogen bond acidity/basicity. All descriptors were calculated using commercially available software.

At this stage, the model included 189 structures, out of which only 61 were satellites. The internal structure of our dataset was considered imbalanced. Therefore, a set of 48 additional satellites was included (Step 12).
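The effect of the balancing step on the composition of the dataset can be checked with simple arithmetic on the counts given above. The helper name `satellite_fraction` is hypothetical; no particular target ratio is stated in the description, so the snippet only reports the proportions.

```python
def satellite_fraction(n_core, n_satellite):
    """Share of satellite objects in a dataset of core and satellite objects."""
    return n_satellite / (n_core + n_satellite)


n_core = 128
before = satellite_fraction(n_core, 61)        # 61 of 189 objects
after = satellite_fraction(n_core, 61 + 48)    # 109 of 237 objects
print(f"{before:.2f} -> {after:.2f}")  # 0.32 -> 0.46
```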

The model was redefined using PCA, for a set of 237 compounds and 62 descriptors, providing eight significant principal components (Step 13).

This model was found satisfactory (Step 14), as no structures were outliers according to outlier detection methods, e.g. DModX (distance to model X) and PCA score distribution.

Variable score values for the 45 aromatic structures were predicted by the model produced in Step 13 (Step 15).

No outliers were observed in the external dataset (Step 16).

Therefore, final score plots were produced for the first four principal components. These were used for navigation in chemical space. For the purpose of illustration, we compared our score plots to those produced by Clementi et al.

We found that we could position those heterocyclic compounds in an appropriate part of the chemical space (FIG. 2) and also see an informative distribution on the local level (FIG. 1) that compares to the score plot based on original data by Clementi et al. (FIG. 3).

FIG. 2 shows the positioning of the heterocyclic compounds described by Clementi et al., using the suggested global model system. Some satellites are indicated, i.e. S1-S4. Note that S3 and S4 are positioned out of the scale (co-ordinates for abscissa and ordinate as indicated in parentheses). S1, carbon tetrachloride; S2, tetraphenyl-adamantane; S3, vinblastine; S4, tetra-arginine.

FIG. 1 shows a close-up of FIG. 2 for investigation of local resolution using the present system.

FIG. 3 shows the positioning of 45 heterocyclic compounds by the approach used by Clementi et al., reproduced by us.

As seen in FIG. 2, the included heterocyclic compounds are positioned close to carbon tetrachloride (satellite S1) in the global chemical space.

The results show that the global model produced using the technique described herein is consistent with the previous local model and yet has much more general applicability.

Extreme values may be defined as discussed above in relation to Step 4. A more broadly based model may be produced, if desired, using the following alternative extreme values.

Variable   Maximum   Minimum
M.W.       1500      30
FLEXI      50        0
SIZE       50        0
RIGID      30        0
HBACC      35        0
HBDON      25        0
POLAR      150       0

As an alternative to determining the SIZE value as discussed above, a measure of this may be made by determining the “calculated molecular refractivity, CMR”, such as by the Daylight CIS software. In this case suitable extreme values are 35 and 0. A further alternative would be to calculate the molecular volume (MVOL) of a molecule by summing the constituent atomic volumes derived using the van der Waals radius of each atom. In this case suitable extreme values would be 2000 and 20 cubic angstroms.
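The MVOL alternative described above can be sketched as a sum of atomic sphere volumes, each computed as (4/3)πr³ from the van der Waals radius r. The description does not name a radius set; the Bondi radii used below are an assumption, as are the function name `mvol` and the dictionary input format. Overlap between bonded atoms is ignored, as in the plain summation the text describes.

```python
import math

# Van der Waals radii in angstroms (Bondi values; an assumption, since the
# text does not specify which radius set is used).
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52,
              "F": 1.47, "S": 1.80, "Cl": 1.75}


def mvol(atom_counts):
    """Sum of atomic sphere volumes in cubic angstroms (atom overlap ignored)."""
    return sum(n * 4.0 / 3.0 * math.pi * VDW_RADIUS[element] ** 3
               for element, n in atom_counts.items())


urea = {"C": 1, "H": 4, "N": 2, "O": 1}  # CH4N2O
print(round(mvol(urea), 1))
```

Under these assumptions urea comes out at roughly 95 cubic angstroms, inside the suggested extreme values of 20 and 2000.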

FIG. 7 schematically illustrates a computer system of the type that may be used to implement the modelling technique described above. In general terms, the computer system 4 contains a central processing unit 6, a working memory 8, a non-volatile memory 10 (such as a hard disc drive), a user input device 12 (such as a keyboard and mouse), a user output device 14 (such as a computer monitor or printer) and a carrier medium reader 16 (such as a CD-ROM drive, floppy disc drive or telecommunications connection). The central processing unit 6 executes program instructions loaded from the non-volatile memory 10 into the working memory 8. These program instructions define the processing steps for performing the modelling technique described above. The non-volatile memory 10 will also include programs for running the various tools for determining properties of the target compounds of the type discussed previously. Also stored within the non-volatile memory 10 is the core object data and the satellite object data that serve to form the N-dimensional model.

A user of the computer system 4 will manipulate the user input device 12 to initiate the running of the modelling program and will input parameters describing the target object to be modelled. In a typical example a library of target objects of potential interest for screening may be input to the model such that their similarities and differences to existing known pharmaceuticals can be studied before a decision as to whether or not to screen those compounds for biological activity is made. When the target objects have been positioned within the model incorporating the core objects and the satellite objects as described above, a user output may be generated using the user output device 14 to display the relative positions of the target objects within the target hyper-volume. The displays may typically be projections into various 2-dimensional planes from the N-dimensional model.

The computer program that performs the modelling will typically be formed of control instructions together with the associated core data and satellite data. These items may be loaded into the non-volatile memory via the carrier medium reader 16. A commercial product embodying the model of the invention may comprise a carrier medium sold for use by a user, who first loads that carrier medium into their computer system 4 via the carrier medium reader 16.

FIG. 8 schematically illustrates a comparison between the local model and the global model approach. Plot A represents a 1-dimensional local model containing four known objects signified by the solid dots. A target object to be positioned within that model is illustrated by the dashed circle. Provided that the target object is of a generally similar type to the core objects within the 1-dimensional model, a reasonable representation of its relationship to the other objects may be obtained by calculating its position within that 1-dimensional model.

It will be appreciated that a 1-dimensional model has been shown for ease of representation. In practice even a simple local model will typically include more than 3 dimensions and so be difficult to illustrate diagrammatically.

Plot B illustrates the situation with a local model when one wishes to position within it a target object which is really too different from the other objects within the local model. In this case, a projection into the 1-dimensional model may be made, but this will tend to give an inaccurate representation of the similarity or differences of that target object from the other objects within that model. The local model does not properly span the property space and accordingly the projections made can result in misleading interpretations.

Plot C illustrates a 2-dimensional model following a global model approach. In this global model a target hyper-volume of interest 18 is formed within the 2-dimensional space and contains core objects illustrated by the solid dots. A number of satellite objects illustrated by crosses lie outside of the target hyper-volume. The inclusion of these satellite objects within the model has given the model a greater span of the property space. Accordingly, in this case when the target object is to be mapped into the target hyper-volume 18, a more representative relative position to the core objects is determined. This leads to a better understanding of the relationship between the target objects and the core objects and improves the general applicability and usefulness of the model.

A set of objects that can be used to form a model of chemical space is given in the attached Annex.

What is claimed is:
 1. A method of mapping a chemical structure of a target type into a target hyper-volume within a model in N-dimensional space comprising a plurality of chemical structures, each chemical structure in said model having an associated set of chemical variables defining its position within said N-dimensional space, each chemical variable having a maximum and minimum value within said model, said method comprising the steps of: a) storing core chemical structure data representing a plurality of core chemical structures of said target type within said target hyper-volume, said target hyper-volume being positioned away from said maximum and minimum values of said chemical variables; b) storing satellite chemical structure data representing a plurality of satellite chemical structures not of said target type positioned outside of said target hyper-volume; c) determining from characteristics of said target chemical structure a position of said target chemical structure within said hyper-volume using the same evaluation criteria as used for said core chemical structures and said satellite chemical structures; d) positioning said target chemical structure within said model relative to said core chemical structures and said satellite chemical structures in accordance with said determined position; and e) generating a user output indicative of said relative position of said target chemical structure.
 2. The method as claimed in claim 1, wherein said target type is a pharmaceutically active chemical structure and said core chemical structure is a pharmaceutical.
 3. The method as claimed in claim 2, wherein said satellite chemical structures are pharmaceutically inactive.
 4. The method as claimed in claim 1, wherein said chemical variables of said target chemical structure are subjected to principal component analysis to determine the position of the target chemical structure within said hyper-volume.
 5. The method as claimed in claim 1, wherein chemical variables of said core chemical structures and said satellite chemical structures are subjected to principal component analysis to determine eigen-vectors that serve as axes of said N-dimensional space.
 6. The method as claimed in claim 1, wherein if said target chemical structure is found to lie outside of said target hyper-volume, then data representing said target chemical structure may be added to said satellite chemical structure data.
 7. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing molecular weight.
 8. The method as claimed in claim 7, wherein all of said chemical structures within said model have a molecular weight of between 1500 and 30.
 9. The method as claimed in claim 7, wherein all of said chemical structures within said model have a molecular weight of between 822 and 60.
 10. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing molecular size.
 11. The method as claimed in claim 10, wherein said variable representing molecular size has a value equivalent to 1 point for every 4 non-hydrogen atoms.
 12. The method as claimed in claim 11, wherein all of said chemical structures within said model have a molecular size value of between 50 and 0.
 13. The method as claimed in claim 11, wherein all of said chemical structures within said model have a molecular size value of between 14 and 1.
 14. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing molecular flexibility.
 15. The method as claimed in claim 14, wherein said variable representing molecular flexibility is a count representing non-terminal rotatable bonds, repeating units and rings within a molecule.
 16. The method as claimed in claim 15, wherein all of said chemical structures within said model have a variable representing molecular flexibility of between 50 and 0.
 17. The method as claimed in claim 15, wherein all of said chemical structures within said model have a variable representing molecular flexibility of between 34 and 0.
 18. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing molecular rigidity.
 19. The method asclaimed in claim 18, wherein said variable representing molecularrigidity is a count of all rigid structures within a molecule.
 20. Themethod as claimed in claim 19, wherein all of said chemical structureswithin said model have a variable representing molecular rigidity ofbetween 30 and
 0. 21. The method as claimed in claim 19, wherein all ofsaid chemical structures within said model have a variable representingmolecular rigidity of between 12 and
 0. 22. The method as claimed inclaim 1, wherein said set of chemical variables comprises a variablerepresenting the number of formal negative charges on a molecule. 23.The method as claimed in claim 22, wherein all of said chemicalstructures within said model have a variable representing the number offormal negative charges of between 0 and −4.
 24. The method as claimedin claim 1, wherein said set of chemical variables comprises a variablerepresenting the number of formal positive charges on a molecule. 25.The method as claimed in claim 24, wherein all of said chemicalstructures within said model have a variable representing the number offormal positive charges of between 5 and
 0. 26. The method as claimed inclaim 1, wherein said set of chemical variables comprises a variablerepresenting the ability of a molecule to accept hydrogen bonds.
 27. Themethod as claimed in claim 26, wherein said variable representing theability of a molecule to accept hydrogen bonds is a count of the numberof oxygen and nitrogen atoms within a molecule that can accept ahydrogen bond.
 28. The method as claimed in claim 27, wherein all ofsaid chemical structures within said model have a variable representingthe ability of a molecule to accept hydrogen bonds of between 35 and 0.29. The method as claimed in claim 27, wherein all of said chemicalstructures within said model have a variable representing the ability ofa molecule to accept hydrogen bonds of between 14 and
 0. 30. The methodas claimed in claim 1, wherein said set of chemical variables comprisesa variable representing the ability of a molecule to donate hydrogenbonds.
 31. The method as claimed in claim 30, wherein said variablerepresenting the ability of a molecule to donate hydrogen bonds is acount of the number of O—H and N—H moieties within a molecule that candonate a hydrogen to form a hydrogen bond.
 32. The method as claimed inclaim 31, wherein all of said chemical structures within said model havea variable representing the ability of a molecule to donate hydrogenbonds of between 25 and
 0. 33. The method as claimed in claim 31,wherein all of said chemical structures within said model have avariable representing the ability of a molecule to donate hydrogen bondsof between 16 and
 0. 34. The method as claimed in claim 1, wherein saidset of chemical variables comprises a variable representing thelipophilicity of a molecule.
 35. The method as claimed in claim 34, wherein said variable representing lipophilicity represents the partition coefficient for said molecule between octanol and water.
 36. The method as claimed in claim 35, wherein all of said chemical structures within said model have a variable representing lipophilicity of between 10 and −10.
 37. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing the sum of atomic polarizabilities within a molecule.
 38. The method as claimed in claim 37, wherein all of said chemical structures within said model have a variable representing the sum of atomic polarizabilities of between 150 and 0.
 39. The method as claimed in claim 37, wherein all of said chemical structures within said model have a variable representing the sum of atomic polarizabilities of between 120 and 5.2.
 40. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing calculated molecular refractivity.
 41. The method as claimed in claim 40, wherein all of said chemical structures within said model have a calculated molecular refractivity of between 35 and 0.
 42. The method as claimed in claim 1, wherein said set of chemical variables comprises a variable representing molecular volume.
 43. The method as claimed in claim 42, wherein all of said chemical structures within said model have a molecular volume of between 2000 and 20 cubic Angstroms.
 44. The method as claimed in claim 1, wherein said target hyper-volume is defined by Pfizer's “Rule of 5”.
 45. The method as claimed in claim 1, wherein one or more anchor chemical structures are situated at the periphery of said target hyper-volume.
 46. The method as claimed in claim 45, wherein said anchor chemical structure is a pharmaceutical.
 47. A computer apparatus comprising a program for performing the method steps of any of claims 1-46.
 48. An apparatus for mapping a target chemical structure of a target type into a target hyper-volume within a model in N-dimensional space comprising a plurality of chemical structures, each chemical structure in said model having an associated set of chemical variables defining its position within said N-dimensional space, each chemical variable having a maximum and minimum value within said model, said apparatus comprising:
 a) a memory operable to store core chemical structure data representing a plurality of core chemical structures of said target type within said target hyper-volume, said target hyper-volume being positioned away from said maximum and minimum values of said chemical variables and operable to store satellite chemical structure data representing a plurality of satellite chemical structures not of said target type positioned outside of said target hyper-volume;
 b) determination logic operable to determine from characteristics of said target chemical structure a position of said target chemical structure within said hyper-volume using the same evaluation criteria as used for said core chemical structures and said satellite chemical structures;
 c) positioning logic operable to position said target chemical structure within said model relative to said core chemical structures and said satellite chemical structures in accordance with said determined position; and
 d) a user output device for generating a user output indicative of said relative position of said target chemical structure.
 49. A method of forming a model in N-dimensional space comprising a plurality of chemical structures and a target hyper-volume into which target chemical structures are to be mapped, said method comprising the steps of:
 a) selecting a set of chemical variables defining said N-dimensional space;
 b) selecting maximum and minimum values for said chemical variables;
 c) selecting a representative set of core chemical structures within said target hyper-volume;
 d) selecting a representative set of satellite chemical structures outside of said target hyper-volume; and
 e) iteratively testing and altering said model to obtain a set of chemical variables, maximum and minimum values, core chemical structures and satellite chemical structures that span said N-dimensional space and that allow target chemical structures to be mapped to within said target hyper-volume.
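
By way of illustration only, the relationship between the model limits recited in claims 25 to 43 and a target hyper-volume of the kind recited in claim 44 may be sketched as follows. The sketch is not part of the claims. The descriptor names, the dictionary layout, the `mol_weight` entry and the function names are hypothetical, and the target hyper-volume is assumed to be the region in which all four commonly cited "Rule of 5" limits hold (molecular weight at most 500, octanol/water log P at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). Where two alternative ranges are claimed for a variable, the broader one is used.

```python
from typing import Dict, Tuple

# (minimum, maximum) for each chemical variable, taken from the broader
# ranges in claims 25, 28, 32, 36, 38, 41 and 43. Key names are hypothetical.
MODEL_BOUNDS: Dict[str, Tuple[float, float]] = {
    "positive_charges": (0, 5),
    "hbond_acceptors": (0, 35),
    "hbond_donors": (0, 25),
    "logp": (-10, 10),
    "polarizability": (0, 150),
    "molar_refractivity": (0, 35),
    "volume": (20, 2000),  # cubic Angstroms
}


def within_model(descriptors: Dict[str, float]) -> bool:
    """True if every bounded variable lies between its model minimum and maximum."""
    return all(
        low <= descriptors[name] <= high
        for name, (low, high) in MODEL_BOUNDS.items()
    )


def in_target_hyper_volume(descriptors: Dict[str, float]) -> bool:
    """Assumed reading of a 'Rule of 5' hyper-volume: all four limits hold."""
    return (
        descriptors["mol_weight"] <= 500
        and descriptors["logp"] <= 5
        and descriptors["hbond_donors"] <= 5
        and descriptors["hbond_acceptors"] <= 10
    )


def locate(descriptors: Dict[str, float]) -> str:
    """Report where a structure falls relative to the model and the target region."""
    if not within_model(descriptors):
        return "outside model"
    if in_target_hyper_volume(descriptors):
        return "inside target hyper-volume"
    return "outside target hyper-volume"
```

Because the same limits are applied to every structure, a target structure is located using the same evaluation criteria as the core and satellite structures, and a structure that falls outside the model limits is reported as such rather than being extrapolated.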